Improving the accuracy of pronunciation lexicon using Naive Bayes classifier with character n-gram as feature: for language classified pronunciation lexicon generation

Authors

  • Aswathy P. V
  • Arun Gopi
  • T. Sajini
  • Bhadran V. K
Abstract

This paper looks at improving the accuracy of the pronunciation lexicon for Malayalam by improving the quality of front-end processing. A pronunciation lexicon is an inevitable component in speech research and in speech applications such as TTS and ASR. The paper details the work done to improve the accuracy of an automatic pronunciation lexicon generator (APLG) with a Naive Bayes classifier that uses character n-grams as features. The n-grams are used to classify words as Malayalam native words (MLN) or Malayalam English words (MLE). Phonotactics unique to Malayalam is the feature that separates MLE words from MLN words. Native and non-native Malayalam words are used to generate the respective models. Testing is done on different text inputs collected from the news domain, where MLE frequency is high.
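
The classification step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a multinomial Naive Bayes model with add-one smoothing over character bigrams, and it trains on a handful of romanized toy words chosen only for this example (the paper works on Malayalam text with far larger word lists). The names `char_ngrams` and `NgramNaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(word, n=2):
    """Return the character n-grams of a word, with boundary markers."""
    padded = "^" + word + "$"  # mark word start and end
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative sketch)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                          # n-gram order (assumed: bigrams)
        self.alpha = alpha                  # smoothing constant (assumed: add-one)
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.class_totals = Counter()       # label -> number of training words
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, words, labels):
        for word, label in zip(words, labels):
            self.class_totals[label] += 1
            for gram in char_ngrams(word, self.n):
                self.counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def log_scores(self, word):
        """Log prior plus summed log likelihoods of the word's n-grams."""
        total_words = sum(self.class_totals.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_words in self.class_totals.items():
            score = math.log(n_words / total_words)
            denom = sum(self.counts[label].values()) + self.alpha * vocab_size
            for gram in char_ngrams(word, self.n):
                score += math.log((self.counts[label][gram] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, word):
        scores = self.log_scores(word)
        return max(scores, key=scores.get)


# Toy training data (assumption: romanized examples, not the paper's corpus).
native_words = ["vellam", "pusthakam", "keralam", "malayalam"]
english_words = ["computer", "station", "ticket", "doctor"]

model = NgramNaiveBayes(n=2).fit(
    native_words + english_words,
    ["MLN"] * len(native_words) + ["MLE"] * len(english_words),
)

for test_word in ["kollam", "monitor"]:
    print(test_word, model.predict(test_word))
```

The word-boundary markers let the model pick up which character sequences tend to begin or end a word in each class, which is one simple way to expose phonotactic regularities to the classifier. In an APLG pipeline, the predicted label would presumably decide which set of pronunciation rules a word is sent to.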

Similar resources

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the fast increase of the documents, using Text Document Classification (TDC) methods has become a crucial matter. This paper presented a hybrid model of Invasive Weed Optimization (IWO) and Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS) in order to reduce the big size of features space in TDC. TDC includes different actions such as text processing, feature extraction, form...

Pronunciation dependent language models

Speech recognition systems are conventionally broken up into phonemic acoustic models, pronouncing dictionaries in terms of the phonemic units in the acoustic model and language models in terms of lexical units from the pronouncing dictionary. Here we explore a new method for incorporating pronunciation probabilities into recognition systems by moving them from the pronouncing lexicon into the ...

Combining linguistic knowledge and acoustic information in automatic pronunciation lexicon generation

This paper describes several experiments aimed at the long term goal of enabling a spoken conversational system to automatically improve its pronunciation lexicon over time through direct interactions with end users and from available Web sources. We selected a set of 200 rare words from the OGI corpus of spoken names, and performed several experiments combining spelling and pronunciation infor...

Improving mispronunciation detection and diagnosis of learners' speech with context-sensitive phonological rules based on language transfer

This study demonstrates how knowledge of language transfer can enable a computer-assisted pronunciation teaching (CAPT) system to effectively detect and diagnose salient mispronunciations in second language learners’ speech. Our approach uses a HMM-based speech recognizer with an extended pronunciation lexicon that includes both a model pronunciation for each word and common pronunciation varia...

Tool for Czech Pronunciation Generation Combining Fixed Rules with Pronunciation Lexicon and Lexicon Management Tool

This paper presents two different tools which may be used as a support of speech recognition. The tool “transc” is the first one and it generates the phonetic transcription (pronunciation) of given utterance. It is based mainly on fixed rules which can be defined for Czech pronunciation but it can work also with specified list of exceptions which is defined on lexicon basis. It allows the usage...


Publication date: 2014